Zone classification in a document using the method of feature vector generation
نویسندگان
چکیده
A documenZ can be divided inio zones on the basia of ils content. For ezample, a zone can be either tezt or non-tezt. This paper describes an algorithm to classify each given document zone into one of nine diflerent classes. Features for each zone such as run length mean and variance, spatial mean and variance, fraction of the total number of black pizels in the zone, and the zone width ratio for each zone are e&acted. Run length related features are computed along four different canonical directiona. A decision tree classifier is used to assign a zone class on the basis of its feature vector. The performance on an independent
منابع مشابه
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملA Novel Noise-Robust Texture Classification Method Using Joint Multiscale LBP
In this paper we describe a novel noise-robust texture classification method using joint multiscale local binary pattern. The first step in texture classification is to describe the texture by extracting different features. So far, several methods have been developed for this topic, one of the most popular ones is Local Binary Pattern (LBP) method and its variants such as Completed Local Binary...
متن کاملFault Detection and Classification in Double-Circuit Transmission Line in Presence of TCSC Using Hybrid Intelligent Method
In this paper, an effective method for fault detection and classification in a double-circuit transmission line compensated with TCSC is proposed. The mutual coupling of parallel transmission lines and presence of TCSC affect the frequency content of the input signal of a distance relay and hence fault detection and fault classification face some challenges. One of the most effective methods fo...
متن کاملApplying Genetic Algorithm to EEG Signals for Feature Reduction in Mental Task Classification
Brain-Computer interface systems are a new mode of communication which provides a new path between brain and its surrounding by processing EEG signals measured in different mental states. Therefore, choosing suitable features is demanded for a good BCI communication. In this regard, one of the points to be considered is feature vector dimensionality. We present a method of feature reduction us...
متن کامل